Betweenness Centrality in Large Complex Networks 






Marc Barthélemy
CEA, Département de Physique Théorique et Appliquée
BP12 Bruyères-Le-Châtel, France



We analyze the betweenness centrality (BC) of nodes in large complex networks. In general, the
BC increases with connectivity as a power law with an exponent $\eta$. We find that for trees or
networks with a small loop density $\eta = 2$, while a larger density of loops leads to $\eta < 2$. For scale-free
networks characterized by an exponent $\gamma$ which describes the decay of the connectivity distribution, the
BC is also distributed according to a power law with a non-universal exponent $\delta$. We show that this
exponent $\delta$ must satisfy the exact bound $\delta \geq (\gamma + 1)/2$. If the scale-free network is a tree, then we
have the equality $\delta = (\gamma + 1)/2$.



I. INTRODUCTION 

In large complex networks, not all nodes are equiva- 
lent. For example, the removal of a node can have a 
very different effect depending on the node. If the node
is at a dead-end, its removal has no effect, in contrast
with the case of a cut-vertex (the analog of a bridge for
edges), whose removal creates new disconnected components
[1,2]. This question of the importance of nodes in a network
is thus of primary interest since it concerns crucial subjects
such as the resilience of networks to attacks [3-5] and
immunization against epidemics [6]. In social network analysis, this problem of
determining the rank — or the "centrality" — of the actors 
according to their position in the social structure was 
studied a long time ago [7,8]. Different quantities were 
then defined in this context of social networks in order 
to quantify this centrality. The simplest proxy for cen- 
trality one could think of is the connectivity. However, 
the inspection of a simple example such as the one in 
Fig. 1 shows that centrality is in general not related to 
connectivity. The reason is that connectivity is a local
quantity which carries no information about the importance of
the node in the network as a whole. Indeed, the node $v$ in Fig. 1
has a small connectivity, and the effect of its removal is
determined not by its connectivity but by the fact that
it links together different parts of the network. A good
measure of the centrality of a node thus has to incorporate
more global information, such as the role the node plays in
the existence of paths between any two given nodes in
the network. One is thus naturally led to the definition
of the betweenness centrality (BC), which counts the fraction
of shortest paths going through a given node. More
precisely, the BC of a node $v$ is given by [7,8]

$$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \qquad (1)$$

where $\sigma_{st}$ is the total number of shortest paths from node
$s$ to node $t$ and $\sigma_{st}(v)$ is the number of shortest paths
from $s$ to $t$ going through $v$. In the following we will also use
the pair-dependency defined as [9]

$$\delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}}$$

The betweenness centrality $g$ scales as the number of
pairs of nodes ($s \neq t \neq v$) and some authors rescale
it by $(N-1)(N-2)/2$ in order to get a number in the
interval $[0, 1]$ ($N$ is the number of nodes in the giant
component of the network). A naive algorithm for computing
$g$ would lead to a complexity of order $O(N^3)$ and would
thus be prohibitive for large networks. Fortunately, a
fast algorithm was recently proposed [9] which reduces
the complexity to $O(N^2)$, allowing the computation of the
centrality for large networks.

[Figure 1: a node $v$ connecting Region $C_1$ and Region $C_2$]

FIG. 1. The node $v$ has a small connectivity (only two
neighbors) but all shortest paths from region 1 to region 2
have to go through $v$, which implies a very large centrality. In
fact, $v$ is here a cut-vertex; its removal will break the network
into two disconnected components.

The definition (1) is indeed a good description of centrality,
as can easily be seen on the example of Fig. 1.
The BC of the node $v$ is given by

$$g(v) = 2 \sum_{s \in C_1,\, t \in C_2} \frac{\sigma_{st}(v)}{\sigma_{st}} = 2 N_1 N_2$$

where $N_1$ ($N_2$) is the number of nodes in region $C_1$ ($C_2$).
The first equality comes from the fact that the terms for
which $s$ and $t$ are in the same region do not contribute,
since in this case $\sigma_{st}(v) = 0$. This result shows that
although $v$ has a small connectivity, its BC defined by
(1) is large, as intuitively expected. This little argument
prefigures the more general one about centrality for trees
(see below).
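
The two-region argument can be checked numerically. The following is a minimal Python sketch, assuming a breadth-first dependency-accumulation scheme of the kind introduced by Brandes (presumably the fast algorithm cited as [9]; this attribution is an assumption). The function names `betweenness` and `two_regions`, and the test graph made of two cliques joined only through $v$, are illustrative choices that do not come from the paper. Ordered pairs $(s,t)$ and $(t,s)$ are counted separately, which is what produces the factor 2 in $2 N_1 N_2$.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized BC of Eq. (1) for an unweighted, undirected graph.

    adj maps each node to an iterable of its neighbors. The sum runs over
    ordered pairs (s, t) with s != v != t, so no factor 1/2 is applied.
    """
    g = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: distances and shortest-path counts.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate the pair-dependencies of s, farthest nodes first.
        dep = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                dep[u] += sigma[u] / sigma[w] * (1.0 + dep[w])
            if w != s:
                g[w] += dep[w]
    return g

def two_regions(n1, n2):
    """Hypothetical Fig. 1-like graph: two cliques joined only through 'v'."""
    left = [("a", i) for i in range(n1)]
    right = [("b", i) for i in range(n2)]
    adj = {u: set() for u in left + right + ["v"]}
    for group in (left, right):
        for x in group:
            adj[x].update(y for y in group if y != x)
    for x in (left[0], right[0]):
        adj[x].add("v")
        adj["v"].add(x)
    return adj

n1, n2 = 4, 6
g = betweenness(two_regions(n1, n2))
print(g["v"], 2 * n1 * n2)  # both values should coincide
```

Each source node costs one breadth-first search, so the total cost grows as the number of nodes times the number of edges, which for sparse networks is of order $N^2$.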

High values of the centrality thus indicate that a node 
can reach the others on short paths or that this vertex 
lies on many short paths. If one removes a node with 
large centrality it will lengthen the paths between many 
pairs of nodes. The extreme case is when the node is 
a cut-vertex [1,2] and its removal creates new connected 
components. This was for example used in [10] to deter- 
mine recursively different communities in large networks. 

There are other centrality indices based on shortest 
paths linking pairs of nodes (stress, closeness, or graph 
centrality [8,9]). In order to take into account the fact 
that shortest paths are not always relevant, other defini- 
tions were introduced such as the flow betweenness [11] 
and recently a betweenness centrality based on random 
walks [12]. The definition (1) differs from the following
one, which includes the path endpoints $s$ and $t$:

$$\tilde{g}(v) = \sum_{s \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

($s$ or $t$ can be $v$). It can be easily checked that

$$\tilde{g}(v) = 2(N-1) + g(v)$$

This additional term $2(N-1)$ is sub-dominant since
$g \sim O(N^2)$ and is thus negligible in the limit of large
networks, leading to the same results for both definitions
(for a typical value of the order $N = 10^4$, the relative
difference for large connectivities is negligible, of order
$10^{-4}$, but could be larger for lower $k$). In this work,
we use the definition (1) and restrict ourselves to
non-weighted and non-directed graphs. We will rescale the
BC by $(N-1)(N-2)/2$ so that $g \in [0, 1]$. We will keep
the same notation $g$ for this normalized centrality.
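
The offset $2(N-1)$ between the two definitions can be made explicit with a short derivation. This is a sketch under two assumptions: the network is connected (so every pair has at least one shortest path), and an endpoint is counted as lying on every shortest path that starts or ends at it. The notation $\tilde{g}(v)$ for the endpoint-inclusive sum is introduced here for convenience.

```latex
\begin{align*}
\tilde{g}(v) &= \sum_{s \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \\
 &= \underbrace{\sum_{t \neq v} \frac{\sigma_{vt}(v)}{\sigma_{vt}}}_{s = v:\ N-1 \text{ terms equal to } 1}
  + \underbrace{\sum_{s \neq v} \frac{\sigma_{sv}(v)}{\sigma_{sv}}}_{t = v:\ N-1 \text{ terms equal to } 1}
  + \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \\
 &= 2(N-1) + g(v).
\end{align*}
```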



II. CENTRALITY AND CONNECTIVITY 

It has been observed [13] that large networks can be 
essentially classified in two categories according to the 
decay of the connectivity distribution $P(k)$. The first
category comprises the "exponential" networks with a
connectivity distribution decaying faster than any power
law (random graph, Poisson graph, etc.). In contrast, the
second category is constituted by the "scale-free" networks,
which have a connectivity distribution decaying as
a power law characterized by an exponent $\gamma$:

$$P(k) \sim k^{-\gamma} \qquad (9)$$



For these networks, there are no typical nodes since the 
connectivity can vary over a large range of values. In 
this sense, scale-free networks are very heterogeneous 
compared to exponential networks for which connectivity 
fluctuations are small. 

In the following, we will investigate the BC for net- 
works which are simple models representative of each 
class. 



A. Scale-Free Networks 

In the case of scale-free networks, Goh et al. have presented
a numerical study of the BC (or "load") distribution
in a static scale-free network model [14]. For this
scale-free model, the exponent $\gamma \in ]2, \infty[$ is a tunable
parameter. They also studied the scale-free model obtained
by preferential attachment [15], for which $\gamma = 3$. They
showed that the BC is distributed according to a power
law with exponent $\delta$ [16]:

$$P(g) \sim g^{-\delta} \qquad (10)$$

This behavior holds for large $g$ up to a cut-off value
which is controlled by finite-size effects. On the basis of
their numerical results, they conjectured that the value
$\delta \simeq 2.2$ is "universal" for all values of $\gamma \in ]2, 3]$. Universality
is usually invoked in physics when different systems
show the same behavior [18]. For example many of the 
observed second order phase transitions have a behavior 
which depends only on the dimension of the system and 
the symmetry of the order parameter. In terms of the 
renormalization group, all these systems are described by 
the same fixed point of the renormalization group trans- 
formation and their critical exponents are then equal. 
In the case of networks, Goh et al. [14,17] measured the
exponent $\delta$ for different real-world and in silico systems
and found only two classes [17]: either $\delta \simeq 2.2$ (Class I)
or $\delta = 2$ (Class II). On the basis of these numerical findings,
they claimed that there is "universality" and that
networks could be classified according to the value of $\delta$.
This means that within a given class, $\delta$ is independent of
the details of the network such as the mean connectivity
$\langle k \rangle = 2m$, or the exponent $\gamma$.

The value of $\delta$ is however not universal [19] and varies
significantly as $\gamma$ changes in the interval $]2,3]$ or as $m$
varies. In order to see this non-universality, we first computed
the cumulative function $F(g) = \mathrm{Prob}(\mathrm{BC} > g)$ for
the model proposed in [14] and for the scale-free network
obtained by preferential attachment [15] (the BA model). The results are
shown in Fig. 2 and, even if the variations are small, the
differences are significant enough to show that $\delta$ varies.
However, as can be seen in Fig. 2 for the BA case,
the power law is screened by a cut-off which can be small
due to finite-size effects.



[Figure 2: cumulative function of the load versus $g$; curves labeled BA model, $\gamma = 2.5$, and $\gamma = 3$]

FIG. 2. Cumulative function of the load for different values
of $\gamma = 2$, $2.5$, and $3$ (for $m = 2$). These results were obtained
with the same values as in [14] N = 10 and for 10 configurations.
The power law fits (straight lines) give the values
$\delta = 1.86$, $2.01$ and $2.23$, while for the BA model $\delta \simeq 2.3$.

The variations of $\delta$ obtained with $F(g)$ are significant
enough to claim that it is not a universal exponent, but
in order to double-check our results we can also use an
indirect way of computing $\delta$. We study the relation between
the load and the connectivity [14,20], which is of
the form

$$g \sim k^{\eta} \qquad (11)$$

where the exponent $\eta$ depends on the network. This relation
(between two random variables) implies that for a
given value of $k$, the corresponding value $g_k$ of the centrality
is fixed. Due to noise such as finite-size effects, $g_k$
can however have small fluctuations and we compute the
average of $g_k$ at fixed $k$. The result is shown in Fig. 3
and, as can be seen on this plot, the power law (11) holds
remarkably well over a large range of $k$ and allows an accurate
measurement of $\eta$. In addition, the relation (11) enables us
to estimate the cut-off value above which the power law
(10) does not hold. Indeed, the maximum connectivity
scales as [21] $k_c \sim N^{1/(\gamma-1)}$, which implies that the
maximum BC scales as $g_c \sim N^{\eta/(\gamma-1)}$. Finally, we also
checked that the value of $\eta$ does not change significantly
for different values of the system size: for $\gamma = 2.5$, we
obtain $\eta(N = 10^4) = 1.461 \pm 0.005$, $\eta(N = 2 \cdot 10^4) =
1.467 \pm 0.006$, and $\eta(N = 5 \cdot 10^4) = 1.467 \pm 0.006$, which
represents a relative variation due to size of less than 1%.
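
The measurement procedure described above (average the load at fixed connectivity, then fit a straight line in log-log coordinates) can be sketched as follows. This is only an illustration: the helper `fit_eta`, the unweighted least-squares fit, and the synthetic pairs $(k, g)$ generated with a chosen exponent of 1.5 are assumptions, and none of the numbers are data from the paper.

```python
import math
import random
from collections import defaultdict

def fit_eta(pairs):
    """Least-squares slope of log <g_k> versus log k.

    pairs is an iterable of (k, g) values; <g_k> is the mean load at fixed k.
    """
    loads = defaultdict(list)
    for k, g in pairs:
        loads[k].append(g)
    xs = [math.log(k) for k in loads]
    ys = [math.log(sum(v) / len(v)) for v in loads.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic (k, g) pairs, NOT data from the paper:
# g = k^1.5 multiplied by a small random factor standing in for fluctuations.
rng = random.Random(1)
pairs = [(k, k**1.5 * rng.uniform(0.8, 1.2))
         for k in range(2, 200) for _ in range(50)]
print(f"fitted eta = {fit_eta(pairs):.3f}")
```

Averaging before taking the logarithm follows the text, which computes the average of $g_k$ at fixed $k$; a weighted fit or logarithmic binning in $k$ would be reasonable alternatives for real data.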

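The exponents $\eta$ and $\delta$ are not independent since
Eq. (11) implies that

$$P(g) = \int dk\, P(k)\, \delta(g - k^{\eta}) \qquad (12)$$

(here $\delta(\cdot)$ denotes the Dirac delta function),
which for large $g$ implies a large $k$ and

$$P(g \gg 1) \sim \int dk\, k^{-\gamma}\, \delta(g - k^{\eta}) \sim g^{-\left(1 + \frac{\gamma - 1}{\eta}\right)}$$

which proves the following equality [20]

$$\eta = \frac{\gamma - 1}{\delta - 1} \qquad (14)$$

If the value $\delta \simeq 2.2$ is universal then $\eta$ is a linear
function of $\gamma$ with slope $1/1.2 \simeq 0.83$.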
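
The link between the three exponents, read here as $\delta = 1 + (\gamma - 1)/\eta$, can be checked by direct sampling. The sketch below rests on assumptions that are not in the paper: connectivities are drawn from a continuous power law on $k \geq 1$, the load is set deterministically to $g = k^{\eta}$, and $\delta$ is estimated with the maximum-likelihood (Hill) estimator. The function names are illustrative. With the values quoted in the text, $\gamma = 2.5$ and $\eta = 1.467$ give $\delta \simeq 2.02$, and $\gamma = 3$ with $\eta = 1.68$ gives $\delta \simeq 2.19$.

```python
import math
import random

def sample_connectivity(gamma, rng):
    """Draw k >= 1 from the continuous density P(k) ~ k^(-gamma)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
    return u ** (-1.0 / (gamma - 1.0))

def tail_exponent(values):
    """Maximum-likelihood estimate of delta for P(x) ~ x^(-delta), x >= 1."""
    return 1.0 + len(values) / sum(math.log(x) for x in values)

def check_relation(gamma, eta, n=200_000, seed=0):
    """Return (measured delta, delta predicted from gamma and eta)."""
    rng = random.Random(seed)
    g = [sample_connectivity(gamma, rng) ** eta for _ in range(n)]
    predicted = 1.0 + (gamma - 1.0) / eta
    return tail_exponent(g), predicted

for gamma, eta in [(2.5, 1.467), (3.0, 1.68), (3.0, 2.0)]:
    measured, predicted = check_relation(gamma, eta)
    print(f"gamma={gamma}, eta={eta}: "
          f"measured {measured:.3f}, predicted {predicted:.3f}")
```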

[Figure 3: normalized average load versus connectivity $k$]

FIG. 3. Log-log plot of the normalized average load versus
connectivity for the same models as in [14] with $m = 2$.
The power law fits (straight lines) give $\eta = 1.27 \pm 0.01$
($N = 3 \cdot 10^4$), $1.467 \pm 0.006$ ($N = 5 \cdot 10^4$), and $1.68 \pm 0.02$
($N = 5 \cdot 10^4$) for $\gamma = 2$, $2.5$, and $3$ respectively. For the BA
model, $\eta = 1.81 \pm 0.02$ ($N = 5 \cdot 10^4$).

In Fig. 4 we plot the measured $\eta$ versus $\gamma$ for the different
types of networks studied and the corresponding
value predicted by universality. Fig. 4 shows that
even if for $\gamma \simeq 3$ the value $\delta = 2.2$ seems to be acceptable,
the claim of universality for $\gamma \in ]2,3]$ proposed in [14]
does not hold (our results do not fit in the other class
$\delta = 2.0$ either). In addition, we tested the universality
for different values of $m$ and we also obtain variations
ruling it out: for $\gamma = 2.5$ and $N = 2 \cdot 10^4$, we
obtain $\eta = 1.477 \pm 0.006$, $1.56 \pm 0.006$, and $1.64 \pm 0.01$
for $m = 2, 4, 6$ respectively. Even if Goh et al. have recently
shown [22] with a variant of the BA model that for
$m \in [1,2]$ the exponent $\delta$ is close to 2.2, for other models
supposed to be within the same universality class (BA
model, static model, etc.) the exponent $\delta$ varies with $m$
or $\gamma$ and is therefore not universal.

We also note in Fig. 4 that for larger values of $\gamma$, the
exponent $\eta$ seems to converge to the value $\eta = 2$. This
suggests that for an exponential network, formally
characterized by $\gamma = \infty$, the exponent $\eta$ is equal to two.
We will discuss this fact in more detail below.

Finally, the case $m = 1$ for the preferential attachment
is special in the sense that the obtained scale-free
network is a tree. Exact calculations in this case [23,17]
show that $\delta = 2 = \eta$. We will see below that the value
$\eta = 2$ is in fact expected for any tree and that $\delta = 2$ is
the expected value for a scale-free tree only when $\gamma = 3$.

FIG. 4. Exponent $\eta$ versus $\gamma$. If the universality proposed
in [14] were correct, the measured values for $\gamma \in ]2, 3]$
should lie on the "universal" straight line corresponding to
$\delta = 2.2$ (class I).



B. Random graph 

We have seen different examples of scale-free networks
in the previous section and we now focus on the random
graph [24,25] (often called the Erdős-Rényi graph), which is
a typical example of exponential networks, for which the
connectivity distribution decays at least as fast as
an exponential. This network is constructed as follows.
Starting from $N$ nodes, one connects each pair of nodes
with probability $p$. The average final number of edges is
thus $E = pN(N-1)/2$ and the average connectivity is
$2E/N = p(N-1) \simeq pN$ for large graphs. More generally,
the probability that a node has connectivity $k$ is given by
the binomial law

$$P(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k} \qquad (15)$$

which converges to a Poisson law of parameter $\langle k \rangle$ for
large $N$ and small $p$ such that $\langle k \rangle = pN$ is fixed. We
studied the centrality for this network and in Fig. 5 we
plot the measured BC versus the connectivity. Even if
the connectivity does not vary over a very large range,
this plot shows that for large $k$ we have $\eta = 2$. We will
discuss this result in more detail below but we already
note that the random graph has a very small clustering
coefficient $C \sim 1/N$ ($C$ counts the average fraction of
pairs of connected neighbors [26]) and that this property
could possibly be related to the fact that $\eta = 2$.

FIG. 5. Log-log plot of the normalized average load versus
connectivity for the random graph model with $N = 5 \cdot 10^4$ and
$\langle k \rangle = 6$. The straight line is of slope $\eta = 2$.
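
The convergence of the binomial law to a Poisson law can be illustrated numerically. The sketch below uses the parameters quoted for Fig. 5 ($N = 5 \cdot 10^4$, $\langle k \rangle = 6$) and sets $p = \langle k \rangle / (N-1)$, which is an assumption made here so that the binomial mean equals $\langle k \rangle$ exactly; the function names are illustrative.

```python
import math

def binomial_pk(k, n_nodes, p):
    """Binomial law: probability that a node of the random graph has connectivity k."""
    return math.comb(n_nodes - 1, k) * p**k * (1.0 - p) ** (n_nodes - 1 - k)

def poisson_pk(k, mean_k):
    """Poisson law of parameter <k>, the large-N limit of the binomial law."""
    return math.exp(-mean_k) * mean_k**k / math.factorial(k)

n_nodes, mean_k = 5 * 10**4, 6.0
p = mean_k / (n_nodes - 1)  # assumption: chosen so that the mean is exactly <k>
max_gap = max(abs(binomial_pk(k, n_nodes, p) - poisson_pk(k, mean_k))
              for k in range(31))
print(f"largest pointwise gap for k <= 30: {max_gap:.2e}")
```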

III. DISCUSSION AND ANALYSIS OF THE 
RESULTS 

The results obtained above show that the exponents $\eta$
and $\delta$ are not universal and depend on the details of the
network. In particular, if the network is scale-free (tree-like
or not), $\delta$ depends on the exponent $\gamma$ which describes
the power law decay of the connectivity distribution.

The important exponent appears to be $\eta$, which describes
how the betweenness centrality depends on the
connectivity. The "optimal" situation which maximizes
the BC for a vertex is obtained when all shortest paths
go through it, which happens for a tree structure
(i.e. a network without loops). To this optimal tree situation
corresponds the maximum value $\eta = 2$. In order
to show this, we first define some objects. If a vertex $v$
has connectivity $k$, we denote by $v_i$ ($i = 1, \ldots, k$) its $k$
neighbors. Each neighbor $v_i$ defines a "neighborhood"
$C_i$ constituted by the nodes which are closer to this neighbor
than to any other one. More formally, $C_i$ is defined as
follows:

$$C_i = \{ s \mid d(s, v_i) < d(s, v_j) \quad \forall j \neq i \} \qquad (16)$$

When the equality of distances $d(s, v_i) = d(s, v_j)$ is obtained
for some $j$, the node $s$ belongs to the two
neighborhoods $C_i$ and $C_j$. The existence of a non-empty
intersection between different neighborhoods allows for
the possibility of paths by-passing the node $v$.

In the following we denote by $N_i$ the size of each region
$C_i$. In general the shortest paths from $s \in C_i$ to
$t \in C_j$ go through $v$ or avoid $v$ by using paths on nodes
belonging to $C_i \cap C_j$ [see Fig. 6]. If the two nodes $s$ and $t$
belong to the same neighborhood, say $C_i$, there is always
a shortest path within $C_i$ (in the worst case the shortest
path goes through $v_i$ but not through $v$) and therefore

$$\delta_{st}(v) = 0 \quad \text{if } s, t \in C_i \qquad (17)$$

where $\delta_{st}(v) = \sigma_{st}(v)/\sigma_{st}$ is the pair-dependency.
In terms of these neighborhoods $C_i$, the BC can be
rewritten as

$$g(v) = \sum_{s \neq t \neq v} \delta_{st}(v) \qquad (18)$$

$$= \sum_{i \neq j} \; \sum_{s \in C_i,\, t \in C_j} \delta_{st}(v) \qquad (19)$$

(the term $i = j$ gives zero).

For a tree, these regions $C_i$ are disconnected from one
another, so that the unique path between $s \in C_i$ and
$t \in C_j$ ($i \neq j$) passes through $v$, and the BC can then be rewritten as

$$g(v) = \sum_{i \neq j} N_i N_j \qquad (20)$$

If in addition these different parts are of the same order
of magnitude, $N_i \sim N_0$ (which is similar to a statistical
isotropy condition), we obtain

$$g(v) \sim N_0^2\, k(k-1) \qquad (21)$$

which for large $k$ behaves as $k^2$, leading to the value $\eta = 2$.
Obviously, the "isotropy" condition $N_i \sim \mathrm{const.}$ is necessary,
and if it is not satisfied then the preceding argument
does not apply [27]. We note that an exactly solvable
model for which this assumption is satisfied is the tree
graph obtained with the BA model with $m = 1$, where
one indeed finds $\eta = 2$ [17]. The tree situation maximizes
the BC since all shortest paths go through the
node $v$. In any other case, the centrality will be smaller, so
the maximum possible value of $\eta$ is 2. More generally, if
for a network the density of loops is small enough that
most shortest paths which go from $C_i$ to $C_j$ have to
go through $v$, then we obtain $\eta = 2$. This is the case for
trees but also for random graphs, for which the clustering
is small ($\sim 1/N$).
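
The tree result can be verified on a small example. The sketch below compares the branch-size formula $\sum_{i \neq j} N_i N_j$ with a brute-force count of ordered pairs whose path contains $v$, on a hypothetical "isotropic" tree (a center with $k$ arms of $N_0$ nodes each) for which the formula reduces to $N_0^2 k(k-1)$. The graph, the function names and the brute-force test (valid only on trees, where shortest paths are unique) are illustrative assumptions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def branch_sizes(adj, v):
    """Sizes N_i of the regions C_i hanging off each neighbor of v in a tree."""
    sizes = []
    for neighbor in adj[v]:
        seen = {v, neighbor}
        queue = deque([neighbor])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes.append(len(seen) - 1)  # exclude v itself
    return sizes

def bc_from_branches(sizes):
    """Sum over ordered pairs i != j of N_i * N_j."""
    total = sum(sizes)
    return total * total - sum(n * n for n in sizes)

def bc_brute_force(adj, v):
    """Ordered pairs (s, t), s != v != t, whose unique tree path contains v."""
    dist = {u: bfs_distances(adj, u) for u in adj}
    return sum(1 for s in adj for t in adj
               if s != t and v not in (s, t)
               and dist[s][t] == dist[s][v] + dist[v][t])

def spider(k, n0):
    """Hypothetical isotropic tree: center 0 with k arms of n0 nodes each."""
    adj = {0: []}
    label = 1
    for _ in range(k):
        previous = 0
        for _ in range(n0):
            adj[label] = [previous]
            adj[previous].append(label)
            previous = label
            label += 1
    return adj

k, n0 = 5, 4
tree = spider(k, n0)
sizes = branch_sizes(tree, 0)
print(sizes, bc_from_branches(sizes), bc_brute_force(tree, 0),
      n0**2 * k * (k - 1))
```

The same comparison applied to a non-central node (for instance node 1, which has two very unequal branches) shows how the formula behaves when the isotropy condition fails.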




FIG. 6. The node $v$ has here 3 neighbors $v_1, v_2, v_3$. These
neighbors define three different regions which are disconnected
in the case of a tree. When the intersection of these
regions is not empty, (shortest) paths between these regions
which by-pass $v$ can exist (and are represented by the dotted
line between the regions $C_i$).

If, in addition to being a tree, the network is scale-free, we
can use the relation (14), which together with $\eta = 2$ leads
to

$$\eta = 2 \Rightarrow \delta = \frac{\gamma + 1}{2} \qquad (22)$$

This relation in particular implies that for the scale-free
BA network with $m = 1$ and $\gamma = 3$, we obtain
$\delta = (\gamma + 1)/2 = 2$, in agreement with previous results
[23,17]. It should be noted that in both these papers
[23,17] the authors demonstrate that $\delta = 2$ in the specific
case of preferential attachment. However, in [17],
the authors claim that their result is valid for any scale-free
tree with $\gamma > 2$. This is an incorrect statement since
their derivation is only valid for preferential attachment
and in general $\delta$ depends on $\gamma$ as predicted by Eq. (22).



On the other hand (and this is the second possible
category of networks), if there is a significant fraction of
shortest paths which by-pass $v$ then the exponent $\eta$ will
be less than 2. If the network is scale-free then we can
use the relation (14), which together with $\eta < 2$ leads to
the exact bound

$$\eta < 2 \Rightarrow \delta > \frac{\gamma + 1}{2} \qquad (23)$$

The quantity $2 - \eta$ is thus a measure of the density of
loops in the network. The fact that $\eta < 2$ indicates that
the different parts are also connected by shortest paths
which do not pass through the central node. More generally,
it would be interesting to understand how $\eta$ depends
on the different parameters of the network such as $\gamma$, the
clustering coefficient, the loop density, the "anisotropy",
or any other correlation function.
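
For reference, the two statements about $\delta$ follow from the relation between the exponents in one line each. The steps below are a sketch that takes relation (14) in the form $\eta = (\gamma - 1)/(\delta - 1)$ and assumes $\gamma > 1$ and $\eta > 0$.

```latex
\begin{align*}
\eta = \frac{\gamma - 1}{\delta - 1}
  \;&\Longrightarrow\; \delta = 1 + \frac{\gamma - 1}{\eta}, \\
\eta = 2 \;&\Longrightarrow\; \delta = 1 + \frac{\gamma - 1}{2} = \frac{\gamma + 1}{2}, \\
\eta < 2 \;&\Longrightarrow\; \delta > 1 + \frac{\gamma - 1}{2} = \frac{\gamma + 1}{2}.
\end{align*}
```

As a numerical illustration with values quoted in the text, $\gamma = 2.5$ and $\eta = 1.467$ give $\delta \simeq 2.02$, which indeed exceeds the tree value $(\gamma + 1)/2 = 1.75$.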

In summary, it seems that concerning the betweenness
centrality, we can distinguish two main categories.
For the first one, which comprises the trees and tree-like
networks (clustering almost zero, density of loops very
small), we have $\eta = 2$. If in addition the tree is scale-free
with exponent $\gamma$, we have the relation $\delta = (\gamma + 1)/2$.
The second category comprises the networks for which
the density of loops is large enough so that the networks
are very different from trees. In this case, the exponents
$\delta$ and $\eta$, when they exist, are not universal and depend on
the different details (average connectivity, correlations,
etc.). If this "clustered" network is scale-free with exponent
$\gamma$, the exponent $\delta$ must obey an exact bound
[Eq. (23)]. Although we believe that the present picture
is the correct one, further studies are still necessary to
understand exactly which parameters control
the behavior of $\eta$. In this respect, analytical insights
would be particularly valuable.